24 research outputs found

    Glottal Source Cepstrum Coefficients Applied to NIST SRE 2010

    Get PDF
    Through the present paper, a novel feature set for speaker recognition based on glottal estimate information is presented. An iterative algorithm is used to derive the vocal tract and glottal source estimations from speech signal. In order to test the importance of glottal source information in speaker characterization, the novel feature set has been tested in the 2010 NIST Speaker Recognition Evaluation (NIST SRE10). The proposed system uses glottal estimate parameter templates and classical cepstral information to build a model for each speaker involved in the recognition process. ALIZE [1] open-source software has been used to create the GMM models for both background and target speakers. Compared to using mel-frequency cepstrum coefficients (MFCC), the misclassification rate for the NIST SRE 2010 reduced from 29.43% to 27.15% when glottal source features are use

    Improving speaker recognition by biometric voice deconstruction

    Get PDF
    Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have been forcedly replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved during last years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers combined with the use of a set of features derived from the components, resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description about the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a highly controlled acoustic condition database, and on a mobile phone network recorded under non-controlled acoustic conditions

    Glottal-Source Spectral Biometry for Voice Characterization

    Get PDF
    The biometric signature derived from the estimation of the power spectral density singularities of a speaker’s glottal source is described in the present work. This consists in the collection of peak-trough profiles found in the spectral density, as related to the biomechanics of the vocal folds. Samples of parameter estimations from a set of 100 normophonic (pathology-free) speakers are produced. Mapping the set of speaker’s samples to a manifold defined by Principal Component Analysis and clustering them by k-means in terms of the most relevant principal components shows the separation of speakers by gender. This means that the proposed signature conveys relevant speaker’s metainformation, which may be useful in security and forensic applications for which contextual side information is considered relevant

    Bio-inspired Dynamic Formant Tracking for Phonetic Labelling

    Get PDF
    It is a known fact that phonetic labeling may be relevant in helping current Automatic Speech Recognition (ASR) when combined with classical parsing systems as HMM's by reducing the search space. Through the present paper a method for Phonetic Broad-Class Labeling (PCL) based on speech perception in the high auditory centers is described. The methodology is based in the operation of CF (Characteristic Frequency) and FM (Frequency Modulation) neurons in the cochlear nucleus and cortical complex of the human auditory apparatus in the automatic detection of formants and formant dynamics on speech. Results obtained informant detection and dynamic formant tracking are given and the applicability of the method to Speech Processing is discussed

    Relevance of the glottal pulse and the vocal tract in gender detection

    Get PDF
    Gender detection is a very important objective to improve efficiency in tasks as speech or speaker recognition, among others. Traditionally gender detection has been focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists in obtaining uncorrelated glottal and vocal tract components which are parameterized as mel-frequency coefficients. K-fold and cross-validation using QDA and GMM classifiers showed that better detection rates are reached when glottal source and vocal tract parameters are used in a gender-balanced database of running speech from 340 speakers

    Using dysphonic voice to characterize speaker's biometry

    Get PDF
    Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, is proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64 % can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus ofering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading

    A Hybrid Parameterization Technique for Speaker Identification

    Get PDF
    Classical parameterization techniques for Speaker Identification use the codification of the power spectral density of raw speech, not discriminating between articulatory features produced by vocal tract dynamics (acoustic-phonetics) from glottal source biometry. Through the present paper a study is conducted to separate voicing fragments of speech into vocal and glottal components, dominated respectively by the vocal tract transfer function estimated adaptively to track the acoustic-phonetic sequence of the message, and by the glottal characteristics of the speaker and the phonation gesture. The separation methodology is based in Joint Process Estimation under the un-correlation hypothesis between vocal and glottal spectral distributions. Its application on voiced speech is presented in the time and frequency domains. The parameterization methodology is also described. Speaker Identification experiments conducted on 245 speakers are shown comparing different parameterization strategies. The results confirm the better performance of decoupled parameterization compared against approaches based on plain speech parameterization

    Articulatory Feature Detection based on Cognitive Speech Perception

    Get PDF
    Cognitive Speech Perception is a field of growing interest as far as studies in cognitive sciences have advanced during the last decades helping in providing better descriptions on neural processes taking place in sound processing by the Auditory System and the Auditory Cortex. This knowledge may be applied to design new bio-inspired paradigms in the processing of speech sounds in Speech Sciences, especially in Articulatory Phonetics, but in many others as well, as Emotion Detection, Speaker’s Characterization, etc. The present paper reviews some basic facts already established in Speech Perception and the corresponding paradigms under which these may be used in designing new algorithms to detect Articulatory (Phonetic) Features in speech sounds which may be later used in Speech Labelling, Phonetic Characterization or other similar tasks

    Glottal Biometric Features: Are Pathological Voice Studies appliable to Voice Biometry?

    Get PDF
    The purpose of the present paper is to introduce a methodology successfully used already in voice pathology detection for its possible adaptation to biometric speaker characterization as well. For such, the behavior of the same GMM classifiers used in the detection of pathology will be exploited. The work will show specific cases derived from running speech typically used in NIST contests against a Universal Background Model built from the population of normophonic subjects in specific vs general evaluation paradigms. Results are contrasted against a set of impostors derived from the same population of normophonic subjects. The relevance of the parameters used in the study will also be discusse
    corecore